surflog: a tool to predict the best surf for you

Athletes have embraced new technologies to improve and enjoy their sports. For example, Strava (https://www.strava.com/) is a mobile app and website that lets runners and cyclists track their routes. Surfers are no different. There are many apps for understanding surf conditions (e.g., http://www.surfline.com/, http://magicseaweed.com/) and, more recently, sensors that track individual rides (http://www.traceup.com/), but there is a clear void in the middle ground.

I view this middle ground as a simple, personalized application that learns from an individual's lifetime of surfing experience and predicts the quality of a surf session from a few simple inputs. In other words, I seek to help answer age-old surfer questions such as: Should I go surfing today? Where should I go surfing today? The answers, however, are not generic (e.g., from surf-forecasting websites) but tailored to the individual.

User inputs

After each session, the surfer enters a few key details about the session, for example:

  • Date
  • Time
  • Site
  • Rating (how good was the session; scale of 1-10)
  • Ride count (number of rides)
  • Board (which board was used)
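
A single log entry maps naturally onto a small record type. The sketch below is one hypothetical way to hold and validate these inputs in Python; the class and field names (`SurfSession`, `day`, `start`, and so on) are illustrative assumptions, not part of surflog.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class SurfSession:
    """One surf log entry, entered after a session (hypothetical schema)."""
    day: date         # Date of the session
    start: time       # Start time of the session (local time assumed)
    site: str         # Surf spot, e.g. "Asilomar"
    rating: int       # How good the session was, on a 1-10 scale
    ride_count: int   # Number of waves ridden
    board: str        # Which board was used

    def __post_init__(self):
        # Validate the ranges implied by the log format.
        if not 1 <= self.rating <= 10:
            raise ValueError(f"rating must be 1-10, got {self.rating}")
        if self.ride_count < 0:
            raise ValueError(f"ride_count must be >= 0, got {self.ride_count}")
```

For example, `SurfSession(date(2015, 3, 14), time(7, 30), "Asilomar", 7, 12, "shortboard")` creates a valid entry, while a rating of 11 raises `ValueError`.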

Ocean conditions

The date, time, and site of each session are used to look up the ocean conditions during that session, including:

  • Swell height
  • Swell period
  • Swell direction
  • Tide height
  • Wind speed

using the following data sources:

surflog outputs

The user inputs and ocean conditions are used for two purposes:

  1. Visualize surfing trends
  2. Predict the personalized quality of surf, given a set of ocean conditions

For demonstration, I used a log of surf sessions spanning three years in Monterey, California. I then integrated this surf log with hourly buoy data from Monterey Bay (2013 to the present) and with hourly tidal heights for Monterey Bay over the same period.
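
The write-up does not say exactly how each session was matched to the hourly records. One simple approach, assumed here, is to pair each session's start time with the closest hourly observation and to discard matches that are too far apart (the 90-minute cutoff is an arbitrary choice). The hypothetical helper below sketches that lookup with Python's `bisect` module; the same function works for the buoy series and for the tide series.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_observation(obs_times, obs_values, t, max_gap=timedelta(minutes=90)):
    """Return the observation value closest in time to t, or None.

    obs_times must be sorted ascending and aligned with obs_values.
    None is returned when the closest observation is more than max_gap
    away from t (e.g. during a buoy outage).
    """
    i = bisect_left(obs_times, t)
    # The nearest observation is either the last one before t or the first one at/after t.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(obs_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(obs_times[j] - t))
    if abs(obs_times[best] - t) > max_gap:
        return None
    return obs_values[best]


# Hypothetical hourly wave heights (m) for one day, for illustration only.
buoy_times = [datetime(2015, 3, 14, h) for h in range(24)]
buoy_wvht = [h / 10 for h in range(24)]
session_start = datetime(2015, 3, 14, 7, 40)
print(nearest_observation(buoy_times, buoy_wvht, session_start))  # prints 0.8 (the 08:00 record)
```

A session starting at 07:40 is matched to the 08:00 record because it is 20 minutes away, versus 40 minutes for 07:00. Interpolating between the two neighboring hours would be a reasonable alternative.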

I used the following packages in R:

  • Data wrangling: dplyr, readr, tidyr, lubridate, readxl
  • Plotting: ggplot2, cowplot
  • Mapping: ggmap
  • Other: rtide, effects, knitr

2. Predicting the quality of surf sessions

The real benefit of the app will be telling the user whether a particular site, given the current conditions, is worth surfing. The statistical model will combine the current NOAA buoy, tide, and weather conditions with the user's historical data to predict a metric of session quality. As a simple starting point, I fit the following generalized linear model (GLM) with a Poisson distribution and a log link, because the outcome is a count (the number of rides in session i):

\[Rides_{i} \sim Poisson(\mu_{i})\] \[E(Rides_{i}) = Var(Rides_{i}) = \mu_{i}\]

\[log(\mu_{i}) = \alpha + \beta_{h}Height_{i} + \beta_{p}Period_{i} + \beta_{d}Direction_{i} + \beta_{t}Tide_{i}\]

where Height is the swell height (the buoy's significant wave height, WVHT; m), Period is the dominant swell period (DPD; s), Direction is the mean wave direction (MWD; degrees from true north), and Tide is the tidal height (TideHeight; m).
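
To make the fitting step concrete, here is a dependency-free Python sketch of maximum-likelihood estimation for this model. It is not the author's code: in R, `glm(..., family = poisson)` fits the same model by iteratively reweighted least squares (IRLS). The sketch uses Newton-Raphson, which coincides with IRLS for the canonical log link; the function names are hypothetical, and each row of `X` is assumed to be `[1, Height, Period, Direction, Tide]`.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_poisson_glm(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log(mu_i) = sum_j X[i][j] * beta_j.

    X is a list of rows whose first column is 1 (the intercept, alpha);
    y holds the ride counts (at least one must be positive).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / n)  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Score vector X^T (y - mu) and Fisher information X^T diag(mu) X.
        score = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(p)]
        info = [[sum(X[i][j] * mu[i] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)  # Newton step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

As a sanity check on a toy design with one 0/1 predictor, `fit_poisson_glm([[1, 0], [1, 0], [1, 1], [1, 1]], [2, 4, 5, 7])` returns approximately `[log 3, log 2]`: the log of the first group's mean (3) and the log of the ratio of group means (6/3). With raw predictors such as Direction near 300, centering each column first usually improves numerical conditioning.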

I fit the model separately to each site in the central California surf log. For a given site, the fitted model predicts the number of rides as a function of one predictor while holding all others at their median values, which lets me visualize the partial effects for one site below:

Figure 3. Partial effects of each predictor from the generalized linear model fit to the surf log data from one site, Asilomar.
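
Each partial-effect curve in Figure 3 can be computed from the fitted coefficients: set every predictor to its median, sweep one of them across a grid, and apply the inverse of the log link. A minimal Python sketch of that calculation, assuming an intercept-first coefficient list as in the model above (the helper name is hypothetical):

```python
import math
from statistics import median


def partial_effect(beta, X, var_index, grid):
    """Predicted rides as column var_index sweeps over grid, others held at their medians.

    beta: [alpha, beta_h, beta_p, beta_d, beta_t] (intercept first)
    X: design rows [1, Height, Period, Direction, Tide] used to compute the medians
    var_index: which column to vary (1 = Height, 2 = Period, ...)
    """
    medians = [median([row[j] for row in X]) for j in range(len(beta))]
    curve = []
    for value in grid:
        x = list(medians)
        x[0] = 1.0            # intercept term
        x[var_index] = value  # the predictor being varied
        eta = sum(b * xj for b, xj in zip(beta, x))  # linear predictor log(mu)
        curve.append(math.exp(eta))                  # back-transform to expected rides
    return curve
```

Plotting `curve` against `grid` for each predictor in turn gives one panel per predictor. R's `effects` package provides similar displays for fitted GLMs, although its default choice of values for the fixed predictors may differ from the medians used here.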

Conclusions

In brief, the model suggests that this central California surfer has the best sessions (in terms of ride count) when the swell height is small (< 4 ft, about 1.2 m) and the swell comes from the northwest (direction > 300°). To a lesser extent, the surfer catches more waves at lower tide and when the swell is larger, but these effects are marginal.
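
Because the model uses a log link, each coefficient has a multiplicative interpretation, which is how statements such as "more rides at lower tide" map onto the fitted model. Holding the other predictors fixed, increasing Height by one unit (1 m) multiplies the expected ride count by a constant factor:

\[\frac{E(Rides_{i} \mid Height_{i} + 1)}{E(Rides_{i} \mid Height_{i})} = \frac{\exp(\alpha + \beta_{h}(Height_{i} + 1) + \beta_{p}Period_{i} + \beta_{d}Direction_{i} + \beta_{t}Tide_{i})}{\exp(\alpha + \beta_{h}Height_{i} + \beta_{p}Period_{i} + \beta_{d}Direction_{i} + \beta_{t}Tide_{i})} = e^{\beta_{h}}\]

So a negative \(\beta_{h}\) means fewer expected rides as the swell grows, and the same reading applies to \(\beta_{p}\) (per second of period), \(\beta_{d}\) (per degree of direction), and \(\beta_{t}\) (per meter of tide).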

This analysis is based on an example surf log, together with hand-entered data for the swell and tide conditions. In the finished app, this data collection will be automated, and the statistical model can also be improved. I foresee the following steps for completing the app over the next three months:

  1. Automate the web scraping of NOAA and tide conditions
  2. Automate the web scraping of weather conditions
  3. Determine the best statistical model (e.g., regression or CART)
  4. Make the app user-friendly (i.e., not a spreadsheet and R)
    • for example, create a ranking system of the surfer’s beaches given a set of conditions
    • identify when there is high or low confidence for a given prediction (a function of the input data)
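
The ranking idea in step 4 could be prototyped directly from the per-site models: predict the expected ride count at each site under one set of conditions and sort. The sketch below assumes one fitted coefficient list per site, in the order `[alpha, beta_h, beta_p, beta_d, beta_t]`, and, for simplicity, the same conditions at every site; the function and site names are hypothetical. It ranks by the point prediction only and does not address confidence, which the last bullet would still require.

```python
import math


def rank_sites(site_models, conditions):
    """Rank sites by expected ride count under one shared set of conditions.

    site_models: {site_name: [alpha, beta_h, beta_p, beta_d, beta_t]}, one fit per site
    conditions: [Height, Period, Direction, Tide] for the forecast period
    Returns a list of (site_name, expected_rides), best site first.
    """
    x = [1.0] + list(conditions)  # prepend the intercept term
    predictions = []
    for site, beta in site_models.items():
        mu = math.exp(sum(b * xj for b, xj in zip(beta, x)))  # inverse log link
        predictions.append((site, mu))
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)
```

For instance, `rank_sites({"Asilomar": beta_asilomar, "Site B": beta_b}, [1.2, 12.0, 305.0, 0.6])`, with two hypothetical fitted coefficient lists, would return both sites ordered from most to fewest expected rides.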